Wikipedia:Reliability of GNIS data

From Wikipedia, the free encyclopedia

Wikipedia has thousands of "populated place" stubs that were mass-created from the United States government's Geographic Names Information System (GNIS) database. Unfortunately, this source has a major flaw: GNIS has labeled many locations as "populated places" in error, rather than as a locale or another more accurate category. There are numerous discrepancies between GNIS and the print versions of the National Gazetteer, a USGS publication containing the same entries. As a result, everything from small homesteads to railroad junctions to river crossings has been mislabeled as a "populated place".

Feature classes

The Geographic Names Information System is the official repository for place names in the United States, with a database of over 2 million natural and human-made features.[1] Entries are compiled from sources such as atlases, gazetteers and topo maps.

Each place is assigned an official name and a "feature class" such as Park, School, Dam, Populated Place or Locale. Locale is meant to encompass miscellaneous human-made features such as battlefields, campgrounds, farms, railroad sidings and windmills. However, the topo maps that provide the bulk of GNIS entries do not clearly distinguish locale-type features from cities, towns, villages and hamlets. As a result, many locales were incorrectly transcribed as "populated places". That label is supposed to apply to "... a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between." In other words, many of our "populated place" articles exist only because an employee poring over a map missed a subtle difference in typeface.

It's difficult to prove that there was never a human settlement at a given location. In many cases, however, research has shown that the place name was only ever used for a railroad siding, ranch, windmill or other feature. For example, Haberman, NY was the site of a train station built to serve the Haberman Manufacturing Company in Queens. The USGS employee who added the location to the database failed to recognize the subtle difference in spacing that the topo map used to distinguish a train station from a community. Wikipedia doesn't seem to have repeated this particular error, since we already had a Haberman station article based on a different source. The error did, however, appear in other GNIS-derived sources such as Google Maps.[2]

Propagation of errors

Errors quickly propagate to other online sources that rely on GNIS for location data. Our AfD for Jolly Dump, South Dakota shows that it was never anything more than a place where railroad cars were loaded and unloaded. Yet a Google search brings up all of the following:
- a "Things to do in Jolly Dump" Facebook page
- a list of nearby FedEx locations
- a "Populated Place Profile" with coordinates and elevation copied from GNIS
- nearby hotels ("lastminute.com has a fantastic range of hotels in Jolly Dump, with everything from cheap hotels to luxurious five star accommodation available!")
- a weather forecast and daylight saving time information

This type of coverage is sometimes presented as evidence of notability. It doesn't meet our "significant coverage" requirement, however, since it is simply copied from another source by an automated program. Wikipedia is also a link in this chain of errors. When we describe a place as an "unincorporated community", a label that is often completely unsourced, Google Maps copies it as a description of the place.

GNIS itself has been found to propagate questionable information from other sources. Most entries were taken from the most detailed USGS topographic maps (1:24,000 or 1:25,000 scale). We have also found entries copied from smaller-scale (less detailed) topographic maps and from these other sources:
- NOAA navigational charts
- Forest Service maps
- promotional maps
- Rand McNally atlases
- books of place names
- even a philatelic journal

One can readily deduce that such entries do not appear on the detailed topographic maps, which already adds an element of doubt. The nautical charts can be verified online, and we have found that they were sometimes misread and sometimes bore name labels on shore that could not be reconciled with other maps. Promotional maps tend to list non-notable subdivisions. Other sources report fourth-class post offices, which were typically just a place in a store, a railroad station or even a private residence where people could post and pick up their mail.

Official standards

Although GNIS provides the official name of a place, its "feature class" labels do not carry the same official standing. They are used simply for "efficient data search and retrieval purposes" and "have no status as standards".[1] In fact, GNIS specifically does not involve itself in such geographic minutiae as the differences between hills and mountains, lakes and ponds, or rivers and creeks.[3] As editors, we need to be aware of the purpose and shortcomings of GNIS. We should use it where it excels (names and coordinates) and rely on other sources for notability and feature type. After all, our research and editorial discretion are what distinguish Wikipedia from machine-generated gazetteers such as Hometown Locator.

Feature classes abandoned in 2014

In 2017 the USGS made this announcement:

Data Content: Since GNIS staff has been unable to maintain Domestic administrative names for quite some time (since October 1, 2014), these records will be archived from GNIS database and will [no] longer be available through the GNIS search application. The following feature classes will be archived: Airport, Bridge, Building, Cemetery, Church, Dam, Forest, Harbor, Hospital, Mine, Oilfield, Park, Post Office, Reserve, School, Tower, Trail, Tunnel, and Well.

Some Wikipedia articles were bulk-created in earlier years from these archived records. They now link to blank records in the "gaz-domestic" (NGNDB) database, whose search interface is at https://edits.nationalmap.gov/apps/gaz-domestic/public/search/names.
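Editors checking a batch of bulk-created articles might use a helper like the following minimal Python sketch. It flags which feature classes fall under the 2017 archiving, and therefore which GNIS links are likely to lead to blank records. The class list is copied from the announcement above. The function name, the case-insensitive matching and the sample article names are all hypothetical; none of this is part of any USGS tool.

```python
# Feature classes that the 2017 USGS announcement says would be archived.
ARCHIVED_FEATURE_CLASSES = frozenset({
    "Airport", "Bridge", "Building", "Cemetery", "Church", "Dam", "Forest",
    "Harbor", "Hospital", "Mine", "Oilfield", "Park", "Post Office",
    "Reserve", "School", "Tower", "Trail", "Tunnel", "Well",
})

# Case-folded copy so that "post office" and "Post Office" both match.
_ARCHIVED_FOLDED = frozenset(c.casefold() for c in ARCHIVED_FEATURE_CLASSES)


def is_archived_class(feature_class: str) -> bool:
    """Return True if the announcement lists this GNIS feature class as archived."""
    return feature_class.strip().casefold() in _ARCHIVED_FOLDED


# Hypothetical article names paired with the feature class GNIS gave them.
articles = [
    ("Example Cemetery", "Cemetery"),
    ("Example Siding", "Locale"),
    ("Example Junction", "Populated Place"),
]
for name, feature_class in articles:
    status = "archived" if is_archived_class(feature_class) else "not in the archived list"
    print(f"{name}: {feature_class} ({status})")
```

Populated Place and Locale are not among the archived classes. The 2017 archiving therefore mainly affects articles about buildings, cemeteries, schools and similar features, rather than the "populated place" stubs discussed elsewhere on this page.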

Reliability of locations

GNIS coordinates are generally considered accurate. Several AfD discussions have nonetheless been derailed by what turned out to be a single-digit typing error by a data-entry clerk. Even accurate coordinates may not be appropriate for Wikipedia, because Wikipedia's rules for choosing a coordinate differ from the rules GNIS followed:

  • Per Wikipedia:WikiProject Geographical coordinates/Linear, Wikipedia wants the mid-point of linear features. The GNIS compilation rules, by contrast, set the primary coördinate at the "mouth" of the feature. Secondary coördinates could be any point on the feature, as long as they indicated which other map(s) the feature crossed.[4][5]
  • Per Wikipedia:WikiProject Geographical coordinates#Which coordinates to use, Wikipedia wants the centres of towns and cities. Payne, in the USGS report on GNIS phase 1, called the selection of a coördinate for a large town or city "subjective". Rather than attempt a geometric centre, the GNIS rule was to pick a prominent civic feature (town hall, main intersection, main public library and so forth).[4]
  • In phase 1, coördinates were read directly from the markers on the maps; in phase 2, they were interpolated using contour lines.

A further complication is that some alternative forms of the database substituted coördinate information from the National Map database.
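The following minimal Python sketch illustrates how far a mouth coordinate and a mid-point can diverge. It takes a linear feature as a list of (latitude, longitude) vertices and returns both a GNIS-style primary point and a mid-point measured by distance along the feature. Several things are assumptions made for illustration only:
- the vertices run from the mouth upstream
- distances use the haversine great-circle formula with a mean Earth radius
- points between vertices are interpolated linearly in latitude/longitude, which is only reasonable for short segments
- the function names and the sample stream are hypothetical

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

# Mean Earth radius in kilometres; a spherical Earth is assumed.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mouth(points: List[Point]) -> Point:
    """GNIS-style primary point, assuming vertices are ordered from the mouth."""
    return points[0]


def path_midpoint(points: List[Point]) -> Point:
    """Point halfway along the polyline, measured by distance travelled."""
    if not points:
        raise ValueError("a linear feature needs at least one vertex")
    if len(points) == 1:
        return points[0]
    segments = list(zip(points, points[1:]))
    lengths = [haversine_km(p, q) for p, q in segments]
    half = sum(lengths) / 2
    travelled = 0.0
    for (p, q), d in zip(segments, lengths):
        if travelled + d >= half:
            # Fraction of this segment needed to reach the half-way distance.
            t = (half - travelled) / d if d > 0 else 0.0
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        travelled += d
    return points[-1]  # reached only through floating-point rounding


# A hypothetical stream along the equator, listed from its mouth upstream.
stream = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
print("GNIS-style primary point:", mouth(stream))
print("Mid-point along the path:", path_midpoint(stream))
```

For this sample stream, the mouth is at 0° longitude, while the mid-point falls at about 1.5°, roughly 167 km away. This is the kind of gap that can appear when a GNIS coordinate is copied unchanged into an article about a long river.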

FAQ

  • Q: Aren't government sources always reliable?
  • A: They're generally accurate, but like any reliable source they're susceptible to errors.
  • Q: What's the harm in keeping these stubs?
  • A: Wikipedia is a trusted source that many organizations rely on. For example, some of these places appear on Google Maps with descriptions such as "Jones Windmill is an unincorporated community in Smith County". Yet the "unincorporated community" designation has never appeared in a reliable source. A Wikipedia editor applied it based on their own interpretation of an erroneous "populated place" label. When we keep these stubs, we play an active role in creating and propagating false information.
  • Q: But it returned 6,000 Google search results, and there's even a FedEx office there!
  • A: Many websites use GNIS for automated location data. When you search for real estate listings, store locations or weather reports, the name is used to mark a point on a map and return the requested information. Such a website isn't claiming that the location is notable. It probably doesn't do business there, and most likely isn't even aware that the place exists.
  • Q: If it's listed in GNIS, wouldn't that make it a "populated, legally recognized place" and therefore presumed notable per WP:GEOLAND?
  • A: According to the USGS, "populated place" is a designation for places that are generally not legally defined or recognized: "An entry with Feature Class = Populated Place represents a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between. The boundaries of most communities classified as Populated Place are subjective and cannot be determined." Wikipedia doesn't have a specific definition of what qualifies as a "legally recognized populated place", but repeated discussions have concluded that simply being listed in a government database or appearing on a map does not meet the requirement.

Relevant AfDs

To illustrate the range of misidentified places, here is a list of AfD discussions of GNIS "populated places":

Further reading

Cleanup efforts

Books to check against

There are usually Arcadia Publishing books for a particular locality. Arcadia books are not the be-all and end-all, but they do point the way. They are generally the work of local historians who have already done the poring over old maps, records and photographs for us. Arcadia (and other local history) books have helped in several cases:
- They helped sort out Robert, California (AfD discussion) and Escalle, Larkspur, California.
- They helped identify what Salminas Resort, California (AfD discussion) actually was.
- Conversely, they strengthened the case against the likes of Ettawa Springs, California (AfD discussion).

All of these were two-sentence GNIS-only stubs at the time of their deletion nominations, and all claimed to be an "unincorporated community".

Gazetteers
These are useful for telling whether an "unincorporated community" that is just a dot nowadays was a historical post-town or post-village, or only a post office. That answer can then be followed up in local county or state histories. Lippincott's, in particular, uses a uniform scheme for this. Take care with dates, of course.
  • Baldwin, Thomas; Thomas, Joseph (1855). Lippincott's pronouncing gazetteer. Philadelphia: J.B. Lippincott. hdl:loc.gdc/gdclccn.tmp96023479. LCCN tmp96023479.
  • Lippincott's gazetteer of the world. Philadelphia: J. B. Lippincott & co. 1880. hdl:loc.gdc/scd0001.00193145826. LCCN 02002832. OL 24447594M. (Lippincott's gazetteer of the world at the Internet Archive)
Books of place names
In many states people took it upon themselves to identify the origins of the names of places within the state. These vary in quality but have often helped to clarify matters by giving a more specific characterization of the places in question. We have found these used as GNIS sources, often quite badly.
Old local histories
As with the place-name books, quality is variable. Those from around 1900 tend to gush in their praise of the forefathers and to be heavy on anecdotes. That said, these histories were typically written within a few decades of the founding of the places, at least outside the East Coast. Their age and attention to detail can therefore help resolve matters.

References

  1. "Principles, Policies and Procedures" (PDF). Reston, VA: United States Board on Geographic Names, Domestic Names Committee. December 2016.
  2. Schultz, Isaac (15 October 2019). "The Brief, Baffling Life of an Accidental New York Neighborhood". Atlas Obscura. Retrieved 9 May 2020.
  3. "How Do I?". U.S. Board on Geographic Names. United States Geological Survey. Retrieved 9 May 2020.
  4. Payne 1983, p. 5.
  5. Payne 1985, p. 7.
  6. Swanson 2014, p. 195.
  7. Manca 2012, pp. 161–162.
  8. USBLM, p. 106.
  9. Harriman v. Brown, 8 Leigh 697.
  10. "The Admissibility of Hearsay Evidence in Boundary Disputes". Columbia Law Review. 9 (3): 255–257. March 1909. doi:10.2307/1109094. JSTOR 1109094.
  • Payne, Roger L. (1983). McEwen, Robert B.; Winter, Richard E.; Ramey, Benjamin S. (eds.). Geographic Names Information System (PDF). Geological Survey Circular. United States Geological Survey. 895-F.
  • Payne, Roger L. (1985). Geographic Names Information System: Data Users Guide (6th ed.). Reston, Virginia: United States Geological Survey.
  • Swanson, Drew A. (2014). A Golden Weed: Tobacco and Environment in the Piedmont South. Yale University Press. ISBN 9780300191165.
  • Manca, Joseph (2012). George Washington's Eye: Landscape, Architecture, and Design at Mount Vernon. JHU Press. ISBN 9781421405612.
  • Manual of Instructions for the Survey of the Public Lands of the United States. Technical bulletins. Vol. 6. United States Bureau of Land Management.

See also
